This thesis presents a broad-coverage probabilistic top-down parser, and its application to the problem of language modeling for speech recognition. The parser builds fully connected derivations incrementally, in a single pass from left-to-right across the string. We argue that the parsing approach that we have adopted is well-motivated from a psycholinguistic perspective, as a model that captures probabilistic dependencies between lexical items, as part of the process of building connected syntactic structures. The basic parser and conditional probability models are presented, and empirical results are provided for its parsing accuracy on both newspaper text and spontaneous telephone conversations. Modifications to the probability model are presented that lead to improved performance. A new language model which uses the output of the parser is then defined. Perplexity and word error rate reduction are demonstrated over trigram models, even when the trigram is trained on significantly more data. Interpolation on a word-by-word basis with a trigram model yields additional improvements.